

Jc525 U.S. PTO  
09/420798  
10/19/99

Attorney Docket No. SON-1661  
Date: October 19, 1999

ASSISTANT COMMISSIONER FOR PATENTS  
Washington, D.C. 20231

Sir:

Transmitted herewith for filing is the patent application of

Inventor: YOSHIHIKO IMAMURA

For: PARALLEL PROCESSOR, PARALLEL PROCESSING METHOD, AND STORING  
MEDIUM

Enclosed are:

- Specification and Claim(s).
- Oath or Declaration (unexecuted).
- Eight sheet(s) of drawings.
- An assignment of the invention to \_\_\_\_\_.
- Copy of one priority application(s).
- Associate Power of Attorney.

The fee has been calculated as shown below:

CLAIMS AS FILED

| FOR                               | NUMBER FILED | NUMBER EXTRA | RATE            | FEE          |
|-----------------------------------|--------------|--------------|-----------------|--------------|
| BASIC FEE                         |              |              |                 | \$380/\$760  |
| TOTAL CLAIMS                      | 22 - 20      | 2            | x \$9/\$18      | \$36.00      |
| INDEP. CLAIMS                     | 7 - 3        | 4            | x \$39/\$78     | \$312.00     |
| Fee for Multiple Dependent Claims |              |              | \$130/\$260     | 0            |
| TOTAL FILING FEE                  |              |              |                 | \$1108.00    |

- A Preliminary Amendment is attached.
- A Verified Statement claiming small entity status is enclosed.
- Charge \$ 1108.00 to Deposit Account No. 18-0013 to cover the filing fee. A duplicate copy of this sheet is enclosed.
- The Commissioner is hereby authorized to charge any fees under 37 C.F.R. 1.16 or 1.17 which may be required during the entire pendency of this application, or to credit any overpayment, to Deposit Account No. 18-0013. A duplicate copy of this sheet is enclosed.
- A check in the amount of \$        to cover the filing fee is enclosed.
- Charge \$        to Deposit Account No. 18-0013 to cover the recordal fee. A duplicate copy of this sheet is enclosed.
- Applicant's undersigned attorney may be reached by telephone in our Washington, D.C. office at (202) 955-3750.

All correspondence should be directed to our below listed address.



\_\_\_\_\_  
Ronald P. Kananen  
Reg. No. 24,104

RADER, FISHMAN & GRAUER, P.L.L.C  
1233 20<sup>th</sup> Street, NW, Suite 501  
Washington, DC 20036  
Telephone: (202) 955-3750  
Facsimile: (202) 955-3751

PARALLEL PROCESSOR, PARALLEL PROCESSING METHOD, AND  
STORING MEDIUM

**BACKGROUND OF THE INVENTION**

1. Field of the Invention

The present invention relates to a parallel processor, parallel processing method, and storing medium for storing the routine of the method in a computer readable format.

2. Description of the Related Art

A single processor running Unix® or another operating system (OS) must manage the progress of a plurality of programs simultaneously existing in a local memory when executing programs in a multi-tasking environment. For this function, use is made of the concept of a "process" as opposed to a "program". A "process" is an independent program in execution in a memory space (user memory space) which is set in the local memory and which that program can access independently. Execution of a program means running of a process, while termination of the program means deletion of the process. Also, a process is capable of running and deleting other processes and communicating with other processes.

Since there is one central processing unit (CPU) in a single processor, a maximum of one process can be run at any one time. Therefore, in a single processor, the user memory space is simultaneously assigned to a plurality of independent programs and the plurality of programs are alternately executed in a time sharing mode to alternately run a plurality of processes and thereby realize a multi-tasking environment.

At this time, when one process is in a running state, the other processes are in a waiting state.

In the above multi-tasking environment, a plurality of processes pass messages among each other as described below.

Namely, in a single processor, as explained above, since there is a maximum of one process in a running state at any one time, when one process sending a message is in a running state, another process to receive the message is in a waiting state. Therefore, the running state process sending the message calls up a process management task in a kernel of the OS and writes the message to be sent in a table in a memory which stores the running state of the process to receive the message immediately before it shifted to the waiting state (normally a table storing the context of threads). Then, when the process to receive the message next shifts to the running state, it learns that the message was received by referring to the table and performs processing in accordance therewith. On the other hand, for example, when a process is one which proceeds to the next processing conditional on receiving a message and judges that no message was received when shifting to the running state and referring to the table, that process enters the waiting state. That process shifts to the running state only after confirming the receipt of a message.
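The table-based message passing described above can be illustrated by the following minimal sketch. It is not taken from any actual OS kernel; the names `Kernel`, `ProcessEntry`, `switch_to`, `send`, and `receive` and the layout of the table entry are assumptions introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessEntry:
    """Per-process table entry (hypothetical layout of the saved context)."""
    name: str
    state: str = "waiting"
    mailbox: list = field(default_factory=list)


class Kernel:
    """Toy process management task for a single CPU: at most one process runs."""

    def __init__(self, names):
        self.table = {name: ProcessEntry(name) for name in names}
        self.running = None

    def switch_to(self, name):
        # Time sharing: the currently running process goes back to waiting.
        if self.running is not None:
            self.table[self.running].state = "waiting"
        self.running = name
        self.table[name].state = "running"

    def send(self, receiver, message):
        # The receiver is waiting, so the message is written into its table entry.
        self.table[receiver].mailbox.append(message)

    def receive(self):
        # Called by the running process: return a message, or go back to waiting.
        entry = self.table[self.running]
        if entry.mailbox:
            return entry.mailbox.pop(0)
        entry.state = "waiting"
        self.running = None
        return None


kernel = Kernel(["sender", "receiver"])
kernel.switch_to("sender")
kernel.send("receiver", "hello")
kernel.switch_to("receiver")
first = kernel.receive()   # the message left in the table by the sender
second = kernel.receive()  # no message: the receiver re-enters the waiting state
```

In this sketch the sender and the receiver are never in the running state at the same time; the table is the only channel between them.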

On the other hand, for example, in a multiprocessor which is comprised of a plurality of CPUs connected via a common bus and executes a plurality of mutually independent programs in parallel, usually a maximum of one process is in a running state at one CPU at any one time, but a plurality of processes can simultaneously be in the running state at different CPUs.

Communication between processes is achieved for example by a sending side process passing a message over the common bus and an arbiter monitoring the common bus notifying that message to the receiving side process based on instruction codes indicated in the user program (application program). Therefore, to pass a message between processes, it is necessary that both the message sending side process and receiving side process be in the running state.

In this way, in a multiprocessor, usually messages are not passed using the process management task as in the above explained single processor. That is, there is no process management task in a multiprocessor.

In a multiprocessor, however, when it is necessary to synchronize a plurality of processes operating in parallel, the synchronization is realized by using the above message passing.

Below, a method of synchronizing processes in a multiprocessor of the related art will be explained.

First, the configuration of a general multiprocessor will be explained.

Figure 5 is a view of the configuration of a general multiprocessor.

As shown in Fig. 5, a multiprocessor 1 is configured by connecting, for example, four processor elements 11<sub>1</sub> to 11<sub>4</sub> via a common bus 17. The common bus 17 is connected to a common memory 15 and an arbiter 16.

Here, the processor element 11<sub>1</sub> comprises, for example as shown in Fig. 6, a processor core 31 and a local memory 32, stores a user program read from the common memory 15 via the common bus 17 in the local memory 32, and successively supplies instruction codes of the user program stored in the local memory 32 to the processor core 31 for execution. The processor elements 11<sub>2</sub> to 11<sub>4</sub> have the same configuration, for example, as the processor element 11<sub>1</sub>.

The arbiter 16 monitors execution states (such as the load of the processing) of the processor elements 11<sub>1</sub> to 11<sub>4</sub> and assigns software resources stored in the common memory 15 to the processor elements 11<sub>1</sub> to 11<sub>4</sub>, that is, the hardware resources. Specifically, the arbiter 16 reads the user programs stored in the common memory 15 into the local memories 32 shown in Fig. 6 of the processor elements 11<sub>1</sub> to 11<sub>4</sub>.

The arbiter 16, for example as shown in Fig. 7, reads a main program Prg\_A and subprograms Prg\_B, Prg\_C, Prg\_D, and Prg\_E as user programs into the local memories 32 of the processor elements 11<sub>1</sub> to 11<sub>4</sub> indicated by the arrows in Fig. 7 at the same time or at different times.

Next, a method of synchronizing among programs (processes) of the related art in the multiprocessor 1 shown in Fig. 5 will be explained.

First, the main program Prg\_A stored in a common memory 15 is read into the local memory 32 of the processor element 11<sub>1</sub> by the arbiter 16, then, as shown in Fig. 8, instruction codes written in the main program Prg\_A are successively executed in the processor element 11<sub>1</sub>.

Next, when the instruction code "gen(Prg\_B)" is executed in the processor element 11<sub>1</sub>, a message indicating that is notified to the arbiter 16 via the common bus 17. Then, the subprogram Prg\_B stored in the common memory 15 is read into the local memory 32 of the processor element 11<sub>2</sub> by the arbiter 16 based on the execution states of the processor elements 11<sub>1</sub> to 11<sub>4</sub>, and instruction codes written in the subprogram Prg\_B are successively executed in the processor element 11<sub>2</sub>.

Next, when an instruction code "gen(Prg\_C)" is executed in the processor element 11<sub>1</sub>, a message indicating that is notified to the arbiter 16 via the common bus 17. Then, the subprogram Prg\_C stored in the common memory 15 is read into the local memory 32 of the processor element 11<sub>3</sub> by the arbiter 16 based on the execution states of the processor elements 11<sub>1</sub> to 11<sub>4</sub>, and instruction codes written in the subprogram Prg\_C are successively executed in the processor element 11<sub>3</sub>.

Next, when an instruction code "gen(Prg\_D)" is executed in the processor element 11<sub>1</sub>, a message indicating that is notified to the arbiter 16 via the common bus 17. The subprogram Prg\_D stored in the common memory 15 is then read into the local memory 32 of the processor element 11<sub>4</sub> by the arbiter 16 based on the execution states of the processor elements 11<sub>1</sub> to 11<sub>4</sub>, and instruction codes written in the subprogram Prg\_D are successively executed in the processor element 11<sub>4</sub>.

Next, when an instruction code "wait(Prg\_D)" is executed in the processor element 11<sub>1</sub>, the processing of the processor element 11<sub>1</sub> enters a synchronization waiting state.

Next, when the last instruction code "end" of the subprogram Prg\_D is executed in the processor element 11<sub>4</sub>, a message indicating the completion of the subprogram Prg\_D is notified to the processor element 11<sub>1</sub> via, for example, the arbiter 16. As a result, the processor element 11<sub>1</sub> releases the synchronization waiting state and executes the next instruction code.

Next, when an instruction code "wait(Prg\_C)" is executed in the processor element 11<sub>1</sub>, the processing of the processor element 11<sub>1</sub> enters a synchronization waiting state.

Next, when the last instruction code "end" of the subprogram Prg\_C is executed in the processor element 11<sub>3</sub>, a message indicating the completion of the subprogram Prg\_C is notified to the processor element 11<sub>1</sub> via, for example, the arbiter 16. As a result, the processor element 11<sub>1</sub> releases the synchronization waiting state and executes the next instruction code.

Next, when an instruction code "gen(Prg\_E)" is executed in the processor element 11<sub>1</sub>, a message indicating that is notified to the arbiter 16 via the common bus 17. Then, the subprogram Prg\_E stored in the common memory 15 is read into the local memory 32 of, for example, the processor element 11<sub>4</sub> by the arbiter 16 based on the execution states of the processor elements 11<sub>1</sub> to 11<sub>4</sub>, and instruction codes written in the subprogram Prg\_E are successively executed in the processor element 11<sub>4</sub>.

Next, when the instruction code "gen(Prg\_D)" is executed again in the processor element 11<sub>1</sub>, a message indicating that is notified to the arbiter 16 via the common bus 17. Then, the subprogram Prg\_D stored in the common memory 15 is read into the local memory 32 of the processor element 11<sub>3</sub> by the arbiter 16 based on the execution states of the processor elements 11<sub>1</sub> to 11<sub>4</sub>, and instruction codes written in the subprogram Prg\_D are successively executed in the processor element 11<sub>3</sub>.
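The sequence of instruction codes executed by the main program Prg\_A in the above explanation can be sketched as follows. This is only an illustrative model under stated assumptions: each subprogram is modeled as a thread, "gen" as starting the thread, and "wait" as joining it, so that a wait is released only when the subprogram reaches its "end". The function names `gen`, `wait`, `subprogram`, and `prg_a` are hypothetical.

```python
import threading

finished = []              # names of subprograms that have executed "end"
lock = threading.Lock()


def subprogram(name):
    # The body of the subprogram would run here. Reaching "end" is the only
    # event that another program can synchronize with in this model.
    with lock:
        finished.append(name)


def gen(name):
    # Models "gen(name)": start the subprogram on some other element.
    thread = threading.Thread(target=subprogram, args=(name,))
    thread.start()
    return thread


def wait(thread):
    # Models "wait(name)": released only when the subprogram has ended.
    thread.join()


def prg_a():
    b = gen("Prg_B")
    c = gen("Prg_C")
    d = gen("Prg_D")
    wait(d)
    d_done = "Prg_D" in finished
    wait(c)
    c_done = "Prg_C" in finished
    e = gen("Prg_E")
    d_again = gen("Prg_D")
    for thread in (b, e, d_again):
        thread.join()      # tidy-up for the sketch only, not part of the sequence
    return d_done, c_done


d_done, c_done = prg_a()
```

Because `wait` can only observe the end of a subprogram, there is no point in the middle of a subprogram at which Prg\_A could synchronize with it.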

Summarizing the problems to be solved by the invention, as explained above, in the multiprocessor 1 of the related art, the synchronization between the programs (processes) executed in different processor elements is a simple one of release of a synchronization waiting state caused by execution of an instruction code "wait" in one processor element based on execution of an instruction code "end" indicating completion of execution of a program in another processor element.

Namely, a synchronization waiting state of a processor element based on one program cannot be released until the completion of execution of a program in another processor element. Accordingly, there is a disadvantage that a variety of forms of synchronization among different programs executed at different processor elements such as synchronization among instruction codes written in the middle of programs cannot be realized.

Also, in the multiprocessor 1 of the related art, the arbiter 16 cannot, for example, determine which subprogram will be called up in the future by the main program Prg\_A shown in Fig. 8 during execution of the main program Prg\_A by the processor element 11<sub>1</sub>.

Therefore, as shown in Fig. 8, there is a possibility that the subprogram Prg\_D will end up being assigned to different processor elements 11<sub>3</sub> and 11<sub>4</sub> by the arbiter 16 between a first execution and a second execution of the instruction code "gen(Prg\_D)" in a processor element 11<sub>1</sub>. In such a case, although the subprogram Prg\_D is executed again after a relatively short interval, it is necessary to read the subprogram Prg\_D from the common memory 15 to the processor element 11<sub>3</sub> at the time of the second execution, which results in a longer waiting time of the processor element 11<sub>3</sub>.

Such a situation frequently occurs especially when the memory capacity of the local memory shown in Fig. 6 and the size of the program to be read are of the same order and causes a drastic decline of performance of the multiprocessor 1.

#### SUMMARY OF THE INVENTION

An object of the present invention is to provide a parallel processor and a parallel processing method enabling various forms of synchronization among programs executed in parallel and a storing medium for storing the routine of the method in a computer-readable format.

Another object of the present invention is to provide a parallel processor which can shorten a waiting time of a processor element caused by transfer of a user program between a local memory of the processor element and a common memory.

To achieve the above objects, according to a first aspect of the present invention, there is provided a parallel processor comprising a plurality of processing means which perform mutually parallel processing on the basis of instructions written in programs and are capable of communicating with each other via a common bus, wherein one of the processing means suspends processing based on a program and enters a waiting state when executing a wait instruction and releases the waiting state and restarts the processing based on the program based on execution of a wait release instruction by another processing means and the other processing means executes a next instruction without suspending processing after it executes the wait release instruction.

In the parallel processor according to the first aspect of the present invention, synchronization is established between one processing means and another processing means at the instruction level while both executing programs by using a wait instruction and a wait release instruction in the programs. Namely, it is possible to synchronize among programs without having to wait for completion of execution of one program.

Preferably, the other processing means executes a synchronization wait instruction to enter a synchronization waiting state and releases the synchronization waiting state based on execution of the wait instruction corresponding to the synchronization wait instruction or execution of a program end instruction indicating an end of a program by the one processing means.

Due to this, execution of a wait instruction in one processing means prior to execution of a wait release instruction in another processing means can be prevented.
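The instruction-level synchronization of the first aspect may be pictured with the following minimal sketch. It is an illustrative software model under stated assumptions, not the disclosed hardware: each processing means is modeled as a thread, the wait instruction and wait release instruction as the hypothetical methods `sleep` and `cont` of a hypothetical class `Element`, and the synchronization wait instruction as waiting on the `asleep` flag.

```python
import threading


class Element:
    """One processing means executing one program (hypothetical model)."""

    def __init__(self):
        self.asleep = threading.Event()  # set while the element is in the waiting state
        self.wake = threading.Event()    # set by a wait release instruction

    def sleep(self):
        # Wait instruction: suspend in the middle of the program.
        self.asleep.set()
        self.wake.wait()
        self.wake.clear()
        self.asleep.clear()

    def cont(self):
        # Wait release instruction: returns at once, the caller is not suspended.
        self.wake.set()


log = []
worker = Element()


def worker_program():
    log.append("worker: first half")
    worker.sleep()
    log.append("worker: second half")


thread = threading.Thread(target=worker_program)
thread.start()
worker.asleep.wait()   # stands in for the synchronization wait instruction
log.append("main: cont")
worker.cont()
log.append("main: next instruction")
thread.join()
```

In this model the worker program is released in the middle of its execution, without either program having to reach its end, and the releasing side proceeds to its next instruction immediately.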

According to a second aspect of the present invention, there is provided a parallel processor comprising a plurality of processing means which perform mutually parallel processing on the basis of instructions written in programs and are capable of communicating with each other via a common bus, wherein one of the processing means suspends processing based on a program and enters a waiting state when executing a wait instruction and releases the waiting state and restarts the processing based on the program based on execution of a wait release instruction by another processing means and the other processing means enters a synchronization waiting state when executing the wait release instruction until the one processing means enters the waiting state when that one processing means is not in the waiting state.

Namely, in the parallel processor according to the second aspect of the present invention, synchronization is established between one processing means and another processing means at an instruction level by using a wait instruction and wait release instruction in the programs.

Namely, it is possible to synchronize among programs without waiting for completion of execution of one program. Also, even if a synchronization wait instruction corresponding to a wait release instruction is not written in a program to be processed by another processing means, the other processing means is maintained in a synchronization waiting state until the one processing means enters the waiting state when the one processing means is not in a waiting state.
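The behavior of the second aspect, in which the wait release instruction itself holds the releasing side until the other side has entered the waiting state, can be sketched as follows. The class `SleepSlot` and its methods are hypothetical names introduced for illustration, and a condition variable is assumed as the software stand-in for the signaling over the common bus.

```python
import threading


class SleepSlot:
    """Pairs one sleeping program with one releasing program (hypothetical)."""

    def __init__(self):
        self.cond = threading.Condition()
        self.sleeping = False

    def sleep(self):
        # Wait instruction: enter the waiting state until released.
        with self.cond:
            self.sleeping = True
            self.cond.notify_all()      # lets a blocked cont() proceed
            while self.sleeping:
                self.cond.wait()

    def cont(self):
        # Wait release instruction: if the target is not yet in the waiting
        # state, stay in a synchronization waiting state until it is.
        with self.cond:
            while not self.sleeping:
                self.cond.wait()
            self.sleeping = False
            self.cond.notify_all()


events = []
slot = SleepSlot()


def releaser():
    slot.cont()                         # issued before the sleeper is asleep
    events.append("cont returned")


thread = threading.Thread(target=releaser)
thread.start()
events.append("about to sleep")
slot.sleep()
thread.join()
```

Here no separate synchronization wait instruction is written in the releasing program; `cont` cannot take effect before the matching `sleep`.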

According to a third aspect of the present invention, there is provided a parallel processor comprising a plurality of processing means which perform mutually parallel processing on the basis of instructions written in programs and are capable of communicating with each other via a common bus, comprising a first storage means connected to the common bus for storing the programs and second storage means provided corresponding to the plurality of processing means, reading from the first storage means programs to be executed by corresponding processing means via the common bus, supplying the processing means with instructions written in the read programs, and having faster access speeds than the first storage means; one of the processing means suspending processing based on a program and entering a waiting state when executing a wait instruction and releasing the waiting state and restarting the processing based on the program based on execution of a wait release instruction by another processing means; a second storage means continuing to store a program supplied to its corresponding processing means before entering the waiting state when the processing means is in the waiting state.

In the parallel processor according to the third aspect of the present invention, if one processing means suspends its processing based on the program and enters a waiting state when executing a wait instruction and releases the waiting state and resumes the processing based on the program based on execution of a wait release instruction by another processing means, the program supplied to the one processing means is continuously stored in the second storage means corresponding to the one processing means. Namely, when restarting execution of the program, it is not necessary to read the program from the first storage means to the second storage means.
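The effect on program transfers may be illustrated by the following minimal sketch, which simply counts reads from the first storage means. The class `ProcessorElement`, its `load` method, and the contents of `common_memory` are assumptions made for illustration only.

```python
class ProcessorElement:
    """Second storage means of one processing means, counting transfers."""

    def __init__(self, common_memory):
        self.common_memory = common_memory
        self.resident = None      # name of the program held in the local memory
        self.code = None
        self.transfers = 0        # reads from the common memory over the bus

    def load(self, name):
        # Transfer the program only if it is not already resident.
        if self.resident != name:
            self.code = list(self.common_memory[name])
            self.resident = name
            self.transfers += 1


# Placeholder program contents, hypothetical.
common_memory = {"Prg_d": ["work", "sleep", "work", "end"]}

# Related-art style: the program ends, and the second request may be
# assigned to a different element, so the program is transferred again.
first_element = ProcessorElement(common_memory)
second_element = ProcessorElement(common_memory)
first_element.load("Prg_d")
second_element.load("Prg_d")
related_art_transfers = first_element.transfers + second_element.transfers

# Third aspect: the program stays resident while its element is in the
# waiting state, so resuming it needs no further transfer.
retained = ProcessorElement(common_memory)
retained.load("Prg_d")    # first execution, up to the wait instruction
retained.load("Prg_d")    # resumed after the wait release instruction
```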

According to a fourth aspect of the present invention, there is provided a parallel processing method for performing at least first processing and second processing in parallel based on instructions written in programs, wherein the first processing suspends processing based on a program and enters a waiting state by executing a wait instruction and releases the waiting state and resumes processing based on the program based on execution of a wait release instruction in the second processing and the second processing executes a next instruction without suspending its processing after executing the wait release instruction.

According to a fifth aspect of the present invention, there is provided a parallel processing method for performing at least first processing and second processing in parallel based on instructions written in programs, wherein the first processing suspends processing based on a program and enters a waiting state by executing a wait instruction and releases the waiting state and resumes processing based on the program based on execution of a wait release instruction in the second processing and the second processing enters a synchronization waiting state by executing the wait release instruction until the first processing enters the waiting state when the first processing is not in the waiting state.

According to a sixth aspect of the present invention, there is provided a storage medium for storing in a computer-readable format routines of first processing and second processing to be performed in parallel based on instructions written in programs, wherein the first processing is processing which suspends processing based on a program and enters a waiting state by executing a wait instruction and releases the waiting state and resumes processing based on the program based on execution of a wait release instruction in the second processing and the second processing is processing which executes a next instruction without suspending its processing after executing the wait release instruction.

According to a seventh aspect of the present invention, there is provided a storage medium for storing in a computer-readable format routines of first processing and second processing to be performed in parallel based on instructions written in programs, wherein the first processing is processing which suspends processing based on a program and enters a waiting state by executing a wait instruction and releases the waiting state and resumes processing based on the program based on execution of a wait release instruction in the second processing and the second processing is processing which enters a synchronization waiting state by executing the wait release instruction until the first processing enters the waiting state when the first processing is not in the waiting state.

#### BRIEF DESCRIPTION OF THE DRAWINGS

These and other objects and features of the present invention will become clearer from the following description of the preferred embodiments given with reference to the accompanying drawings, in which:

Fig. 1 is a view of the configuration of a multiprocessor according to a first embodiment of the present invention;

Fig. 2 is a view for explaining the operation of the multiprocessor according to the first embodiment of the present invention shown in Fig. 1;

Fig. 3 is a view for explaining the operation of a multiprocessor according to a second embodiment of the present invention;

Fig. 4 is a view for explaining another operation of the multiprocessor according to the second embodiment of the present invention;

Fig. 5 is a view of the configuration of a general multiprocessor;

Fig. 6 is a view of the configuration inside a processor element shown in Figs. 1 and 5;

Fig. 7 is a view for explaining assignment of programs to processor elements of the multiprocessor shown in Fig. 5; and

Fig. 8 is a view for explaining the operation of the general multiprocessor shown in Fig. 5.

#### DESCRIPTION OF THE PREFERRED EMBODIMENTS

Below, preferred embodiments of a multiprocessor according to the present invention will be described with reference to the accompanying drawings.

First Embodiment

Figure 1 is a view of the configuration of a multiprocessor 51 according to a first embodiment of the present invention.

As shown in Fig. 1, a multiprocessor 51 comprises, for example, a common bus 17, processor elements 61<sub>1</sub> to 61<sub>4</sub> serving as processing means and second storage means, a common memory 65 serving as a first storage means, and an arbiter 66 serving as a program assigning means.

The multiprocessor 51 adopts a bus-connected architecture where the processor elements 61<sub>1</sub> to 61<sub>4</sub>, the common memory 65, and the arbiter 66 are mutually connected via, for example, the common bus 17. These components are built in a single semiconductor chip.

(Outline of Components)

Points in common and of difference between the multiprocessor 51 and the multiprocessor 1 shown in Fig. 5 will be explained first.

The common bus 17 is the same as the common bus 17 in the multiprocessor 1 of the related art shown in Fig. 5.

Also, the processor elements 61<sub>1</sub> to 61<sub>4</sub> are the same as the processor elements 11<sub>1</sub> to 11<sub>4</sub> shown in Fig. 5 in terms of the hardware configuration, each having a processor core 31 and a local memory 32 as shown in Fig. 6, but operate differently along with execution of instruction codes because the processor elements 61<sub>1</sub> to 61<sub>4</sub> execute new instruction codes for establishing synchronization between programs (processes) which are not provided in the multiprocessor 1 of the related art.

The common memory 65 has the same hardware configuration as the common memory 15 shown in Fig. 5, but stores user programs of different content.

Furthermore, the arbiter 66 is the same as the arbiter 16 shown in Fig. 5 in the points that it monitors execution states (load of processing etc.) of the processor elements 61<sub>1</sub> to 61<sub>4</sub> and assigns software resources stored in the common memory 65 to the processor elements 61<sub>1</sub> to 61<sub>4</sub>, that is, the hardware resources, based on the execution states. The arbiter 66, however, performs processing for establishing various forms of synchronization different from that by the arbiter 16 shown in Fig. 5 and assigns processing of user programs to the processor elements 61<sub>1</sub> to 61<sub>4</sub> when the above new instruction codes are executed by the processor elements 61<sub>1</sub> to 61<sub>4</sub>.

(Instruction Codes)

The multiprocessor 51 adopts the instruction codes explained below.

Specifically, the multiprocessor 51 has as instruction codes: "gen(user program name)" as a program execution instruction, "wait(user program name)" as a synchronization wait instruction, "cont(user program name)" as a wait release instruction, "sleep" as a wait instruction, and "end" as a program end instruction. Note that the multiprocessor 51 further has the variety of instruction codes, for example, provided in general-purpose multiprocessors.

Here, the instruction codes "gen(user program name)", "wait(user program name)", and "end" are the same as the instruction codes having the identical names adopted in the multiprocessor 1 shown in Fig. 5, while the instruction codes "cont(user program name)" and "sleep" are first adopted in the multiprocessor 51.

Note that the instruction codes "gen(user program name)", "wait(user program name)", and "cont(user program name)" use user program names as arguments. Note that there may be any number of arguments.

The instruction code "gen(user program name)" is for instructing one of the other processor elements 61<sub>1</sub> to 61<sub>4</sub> to start executing a user program specified by the argument (user program name).

The instruction code "wait(user program name)" is for instructing an element to wait for synchronization until a user program specified by the argument (user program name) executes an instruction code "sleep" or "end".

The instruction code "cont(user program name)" is for instructing one of the other processor elements 61<sub>1</sub> to 61<sub>4</sub> executing a user program specified by the argument (user program name) to release the waiting state when in the waiting state.

The instruction code "sleep" is for instructing elements to temporarily stop the execution of a user program and enter the waiting state.

Note that the above instruction codes may be written in the user program when the programmer prepares the user program or may be automatically inserted in accordance with need by a compiler.
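The five instruction codes and their arguments can be summarized in the following minimal sketch. The textual syntax accepted by `parse`, the enumeration `Op`, and the set names are assumptions introduced for illustration; the specification does not define how the instruction codes are encoded.

```python
import re
from enum import Enum


class Op(Enum):
    GEN = "program execution instruction"
    WAIT = "synchronization wait instruction"
    CONT = "wait release instruction"
    SLEEP = "wait instruction"
    END = "program end instruction"


NAMES = {"gen": Op.GEN, "wait": Op.WAIT, "cont": Op.CONT,
         "sleep": Op.SLEEP, "end": Op.END}
TAKES_PROGRAM_NAMES = {Op.GEN, Op.WAIT, Op.CONT}   # use user program names as arguments
NEW_IN_MULTIPROCESSOR_51 = {Op.CONT, Op.SLEEP}     # not provided in the related art

PATTERN = re.compile(r"^(\w+)(?:\(([^()]*)\))?$")


def parse(text):
    """Split a textual instruction code into (Op, tuple of program names)."""
    match = PATTERN.match(text.strip())
    if match is None or match.group(1) not in NAMES:
        raise ValueError(f"not an instruction code: {text!r}")
    op = NAMES[match.group(1)]
    raw = match.group(2)
    args = tuple(part.strip() for part in raw.split(",")) if raw else ()
    # gen, wait, and cont need at least one program name; sleep and end take none.
    if (op in TAKES_PROGRAM_NAMES) != bool(args):
        raise ValueError(f"wrong arguments for {op.name}: {text!r}")
    return op, args


gen_b = parse("gen(Prg_b)")
wait_bc = parse("wait(Prg_b, Prg_c)")
sleep_op = parse("sleep")
```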

(Details of Components)

The processor element 61<sub>1</sub> has, as shown in Fig. 6, a processor core 31 serving as a processing means and a local memory 32 serving as a second storage means.

Note that in the present embodiment, a case will be explained where the processor elements 61<sub>1</sub> to 61<sub>4</sub> have the same configuration as each other. In the present invention, however, the processor elements 61<sub>1</sub> to 61<sub>4</sub> are not necessarily the same in configuration. For example, the execution speed of the processor core 31, memory capacity of the local memory 32, etc. may be different.

The local memory 32 stores a user program read from the common memory 65 via the common bus 17.

The processor core 31 successively reads and executes the instruction codes of the user program stored in the local memory 32.

When the instruction code "gen(user program name)" is executed, the processor core 31 outputs an execution instruction of the user program specified by the argument (user program name) to the arbiter 66 via the common bus 17 shown in Fig. 1.

When the instruction code "wait(user program name)" is executed, the processor core 31 enters a synchronization waiting state, while when a notice indicating that the instruction code "end" or "sleep" of the user program specified by the argument (user program name) is executed is input from the arbiter 66 via the common bus 17 shown in Fig. 1, the synchronization waiting state is released and the next instruction code is executed.

When the processor core 31 executes the instruction code "cont(user program name)", it outputs an instruction to release the state of waiting for execution of the user program specified by the argument (user program name) to the arbiter 66 via the common bus 17 and executes the next instruction code without suspending the processing.

When the processor core 31 executes the instruction code "sleep", it notifies the arbiter 66 that the instruction code is executed via the common bus 17 and simultaneously enters a waiting state. When an instruction to release the waiting state is input from the arbiter 66, the processor core 31 releases the waiting state and executes the next instruction code.

Also, when the processor core 31 executes the instruction code "end", it notifies the arbiter 66 that the instruction code is executed via the common bus 17 and simultaneously ends the execution of the program.

The common memory 65 stores, for example, user programs Prg\_a, Prg\_b, Prg\_c, Prg\_d, and Prg\_e in which are written various instruction codes including the above "gen(user program name)", "wait(user program name)", "cont(user program name)", "sleep", and "end".

When an execution instruction of a user program is input from one of the processor elements 61<sub>1</sub> to 61<sub>4</sub> via the common bus 17, the arbiter 66 reads the user program specified by the execution instruction from the common memory 65 to, for example, the local memory 32 of the one of the processor elements 61<sub>1</sub> to 61<sub>4</sub> having the smallest load based on the execution states of the processor elements 61<sub>1</sub> to 61<sub>4</sub>.
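The selection of the processor element with the smallest load can be sketched as follows. The specification does not define how load is measured, so the load values here (the number of user programs resident in each local memory) and the function name `pick_element` are assumptions made only for illustration.

```python
def pick_element(loads):
    """Return the name of the element with the smallest load.

    `loads` maps an element name to a hypothetical load figure. Ties go to
    the element listed first, which is a choice of this sketch.
    """
    return min(loads, key=loads.get)


# Element 61_1 is busy with Prg_a; the others are idle.
loads = {"61_1": 1, "61_2": 0, "61_3": 0, "61_4": 0}
print(pick_element(loads))  # 61_2
```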

Also, when an instruction to release the waiting state of execution of the user program is input, the arbiter 66 outputs, via the common bus 17, an instruction to release the waiting state to the one of the processor elements 61<sub>1</sub> to 61<sub>4</sub> executing the user program specified by the instruction.

When a notice indicating that the instruction code "sleep" or "end" was executed is input from one of the processor elements 61<sub>1</sub> to 61<sub>4</sub>, the arbiter 66 notifies the one of the processor elements 61<sub>1</sub> to 61<sub>4</sub> which output the instruction for executing the program including that instruction code that the notice was input.
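The forwarding of "sleep" and "end" notices to the element that requested the program can be sketched as a lookup table kept by the arbiter. The class `NoticeRouter` and its method names are hypothetical; the specification does not state how the arbiter 66 records which element issued each execution instruction.

```python
class NoticeRouter:
    """Remembers which element executed gen(program) and routes notices back."""

    def __init__(self):
        self.requester = {}  # program name -> element that executed gen(program)

    def on_gen(self, element, program):
        self.requester[program] = element

    def on_notice(self, program, kind):
        """Return (destination element, program, kind) for a sleep/end notice."""
        if kind not in ("sleep", "end"):
            raise ValueError(f"unexpected notice: {kind}")
        return (self.requester[program], program, kind)


router = NoticeRouter()
router.on_gen("61_1", "Prg_d")               # 61_1 executed gen(Prg_d)
print(router.on_notice("Prg_d", "sleep"))    # ('61_1', 'Prg_d', 'sleep')
```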

Next, the operation of the multiprocessor 51 will be explained while tracing the process of execution of user programs in the processor elements 61<sub>1</sub> to 61<sub>4</sub> of the multiprocessor 51 shown in Fig. 1.

Here, as shown in Fig. 2, a case is illustrated where user programs Prg\_a, Prg\_b, Prg\_c, Prg\_d, and Prg\_e are executed in the processor elements 61<sub>1</sub>, 61<sub>2</sub>, 61<sub>3</sub>, and 61<sub>4</sub>.

Note that the user programs Prg\_a, Prg\_b, Prg\_c, Prg\_d, and Prg\_e are read to the common bus 17 shown in Fig. 1 from a computer-readable storage medium such as a magnetic disk, magnetic tape, optical disk, or magneto-optic disk.

First, the arbiter 66 reads the user program Prg\_a shown in Fig. 2 from the common memory 65 into the local memory 32 of the processor element 61<sub>1</sub>.

Then, instruction codes written in the user program Prg\_a are successively executed in the processor core 31 of the processor element 61<sub>1</sub>.

Specifically, an instruction code "gen(Prg\_b)" is executed first in the processor core 31 of the processor element 61<sub>1</sub>, whereupon an execution instruction of the user program Prg\_b specified by the argument of the instruction code is output to the arbiter 66 via the common bus 17 shown in Fig. 1.

Then, the arbiter 66 reads the user program Prg\_b from the common memory 65 into the local memory 32 of the processor element 61<sub>2</sub> via the common bus 17. Then, instruction codes written in the user program Prg\_b stored in the local memory 32 are successively read and executed in the processor element 61<sub>2</sub>.

Next, the processor core 31 of the processor element 61<sub>1</sub> successively executes instruction codes "gen(Prg\_c)" and "gen(Prg\_d)". Through processing similar to that of the above instruction code "gen(Prg\_b)", the processor cores 31 of the processor elements 61<sub>3</sub> and 61<sub>4</sub> respectively start to execute the user programs Prg\_c and Prg\_d.

Next, the processor core 31 of the processor element 61<sub>1</sub> executes an instruction code "wait(Prg\_d)" and enters a synchronization waiting state.

Then, the processor element 61<sub>4</sub> executes an instruction code "sleep" written in the user program Prg\_d and enters a waiting state.

Also, the arbiter 66 is notified via the common bus 17 that the instruction code "sleep" was executed in the processor element 61<sub>4</sub>. When the notice is input to the arbiter 66, the arbiter 66 notifies the processor element 61<sub>1</sub>, which output the instruction to execute the user program Prg\_d, that the notice was input.

Then, when the processor element 61<sub>1</sub> receives as an input the notice from the arbiter 66, the synchronization waiting state is released and the next instruction code is executed in the processor element 61<sub>1</sub>.

Next, the processor core 31 of the processor element 61<sub>1</sub> executes an instruction code "wait(Prg\_c)" and enters a synchronization waiting state.

Then, an instruction code "end" written at the end of the user program Prg\_c is executed in the processor element 61<sub>3</sub>, whereupon the execution of the user program Prg\_c by the processor element 61<sub>3</sub> is ended.

Also, the arbiter 66 is notified via the common bus 17 that the instruction code "end" was executed in the processor element 61<sub>3</sub>.

When the notice is input to the arbiter 66, the arbiter 66 notifies the processor element 61<sub>1</sub> which output the instruction to execute the user program Prg\_c that the notice was input.

When the notice is input to the processor element 61<sub>1</sub> from the arbiter 66, the synchronization waiting state is released and the next instruction code is executed in the processor element 61<sub>1</sub>.

Note that when the notice indicating that the instruction code "end" was executed in the processor element 61<sub>3</sub> is input, the arbiter 66 judges that the load on the processor element 61<sub>3</sub> is lifted and frees the local memory 32 of the processor element 61<sub>3</sub>.

Next, the processor core 31 of the processor element 61<sub>1</sub> executes the instruction code "gen(Prg\_e)", and, through processing similar to that of the above instruction code "gen(Prg\_b)", the user program Prg\_e is read to the local memory 32 of the processor element 61<sub>3</sub>, which was freed as explained above, and the processor core 31 of the processor element 61<sub>3</sub> executes the user program Prg\_e.

Next, the processor element 61<sub>1</sub> executes the instruction code "cont(Prg\_d)" and outputs an instruction to release the waiting state for executing the user program Prg\_d to the arbiter 66 via the common bus 17.

Then, the arbiter 66 outputs via the common bus 17 the instruction to release the waiting state to the processor element 61<sub>4</sub> executing the user program Prg\_d.

When the processor element 61<sub>4</sub> receives as an input the instruction from the arbiter 66, the waiting state is released and the next instruction code is executed in the processor element 61<sub>4</sub>.

Note that after executing the instruction code "cont(Prg\_d)", the processor element 61<sub>1</sub> executes the next instruction code without suspending the processing.

Next, the processor core 31 of the processor element 61<sub>1</sub> executes the instruction code "wait(Prg\_d)" and enters a synchronization waiting state.

Then, an instruction code "end" written at the end of the user program Prg\_d is executed in the processor element 61<sub>4</sub>, whereupon the execution of the user program Prg\_d in the processor element 61<sub>4</sub> is ended.

Also, the arbiter 66 is notified via the common bus 17 that the instruction code "end" was executed in the processor element 61<sub>4</sub>. When the notice is input to the arbiter 66, the arbiter 66 notifies the processor element 61<sub>1</sub>, which output the instruction to execute the user program Prg\_d, that the notice was input.

When the processor element 61<sub>1</sub> receives as an input the notice from the arbiter 66, the synchronization waiting state is released and the next instruction code is executed in the processor element 61<sub>1</sub>.

Note that when the arbiter 66 receives as an input the notice indicating that the instruction code "end" was executed in the processor element 61<sub>4</sub>, as explained above, the arbiter 66 judges that the load on the processor element 61<sub>4</sub> was lifted and frees the local memory 32 of the processor element 61<sub>4</sub>.

Next, the processor core 31 of the processor element 61<sub>1</sub> executes the instruction code "wait(Prg\_e)" and enters a synchronization waiting state.

Then the instruction code "end" written at the end of the user program Prg\_e is executed in the processor element 61<sub>3</sub>, whereupon the execution of the user program Prg\_e in the processor element 61<sub>3</sub> is ended.

Also, the arbiter 66 is notified via the common bus 17 that the instruction code "end" was executed in the processor element 61<sub>3</sub>. When the notice is input to the arbiter 66, the arbiter 66 notifies the processor element 61<sub>1</sub>, which output the instruction to execute the user program Prg\_e, that the notice was input.

When the processor element 61<sub>1</sub> receives as an input the notice from the arbiter 66, the synchronization waiting state is released and the next instruction code is executed in the processor element 61<sub>1</sub>.

Note that when the arbiter 66 receives as an input the notice indicating that the instruction code "end" is executed in the processor element 61<sub>3</sub>, as explained above, the arbiter judges that the load of the processor element 61<sub>3</sub> is lifted and frees the local memory 32 of the processor element 61<sub>3</sub>.
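The sequence traced above can be reproduced with a small tick-based simulation. This is only a sketch under several assumptions: the contents of Prg\_a are reconstructed from the order of instruction codes in the text, the "work" steps and their counts are invented so that the events fall in the described order, a free element is chosen by lowest index, and a notice arriving while the requester is not in the synchronization waiting state is simply discarded. None of the names below come from the specification.

```python
PROGRAMS = {
    "Prg_a": ["gen Prg_b", "gen Prg_c", "gen Prg_d", "wait Prg_d", "wait Prg_c",
              "gen Prg_e", "cont Prg_d", "wait Prg_d", "wait Prg_e", "end"],
    "Prg_b": ["work"] * 30 + ["end"],   # hypothetical lengths
    "Prg_c": ["work"] * 6 + ["end"],
    "Prg_d": ["work", "sleep", "work", "end"],
    "Prg_e": ["work"] * 5 + ["end"],
}
NAMES = ["61_1", "61_2", "61_3", "61_4"]


def simulate():
    # Per-element state: program in local memory, program counter, mode.
    pes = [{"prog": None, "pc": 0, "mode": "idle", "on": None} for _ in NAMES]
    requester = {}  # program -> index of the element that executed gen(program)
    placed = []     # (program, element) for every read from the common memory
    events = []     # (element, instruction code) in execution order

    def load(prog, idx):
        pes[idx].update(prog=prog, pc=0, mode="run", on=None)
        placed.append((prog, NAMES[idx]))

    def notify(prog):
        # Arbiter forwards a sleep/end notice to the element that issued gen.
        if prog not in requester:
            return
        req = pes[requester[prog]]
        if req["mode"] == "sync_wait" and req["on"] == prog:
            req.update(mode="run", on=None)

    load("Prg_a", 0)
    for _ in range(1000):  # safety bound on simulated ticks
        if all(pe["prog"] is None for pe in pes):
            break
        for idx, pe in enumerate(pes):
            if pe["mode"] != "run":
                continue
            op, _, arg = PROGRAMS[pe["prog"]][pe["pc"]].partition(" ")
            pe["pc"] += 1
            if op != "work":
                events.append((NAMES[idx], f"{op} {arg}".strip()))
            if op == "gen":
                free = next(i for i, p in enumerate(pes) if p["prog"] is None)
                requester[arg] = idx
                load(arg, free)
            elif op == "wait":
                pe.update(mode="sync_wait", on=arg)
            elif op == "cont":
                target = next(p for p in pes if p["prog"] == arg)
                if target["mode"] == "sleep":
                    target["mode"] = "run"
            elif op == "sleep":
                pe["mode"] = "sleep"
                notify(pe["prog"])
            elif op == "end":
                prog = pe["prog"]
                pe.update(prog=None, mode="idle")  # local memory is freed
                notify(prog)
    return placed, events


placed, events = simulate()
print(placed)
```

With these assumed program lengths, Prg\_d is read into the local memory of element 61\_4 exactly once, and Prg\_e lands on element 61\_3 after Prg\_c has ended, as in the description.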

As explained above, according to the multiprocessor 51, by using the instruction code "sleep" to instruct a waiting state for execution of the program and the instruction code "cont" to release the waiting state in addition to the instruction code "end" to indicate an end of the program, it is possible to establish synchronization between instruction codes of programs being executed in different processor elements 61<sub>1</sub> to 61<sub>4</sub>. Therefore, according to the multiprocessor 51, it becomes possible to perform a variety of processing based on programs written to establish synchronization between instruction codes.

Namely, it becomes possible to establish synchronization between user programs without requiring, as in the multiprocessor 1 of the related art, that execution of the user program be ended by an instruction code "end".

Also, according to the multiprocessor 51, as shown in Fig. 2, since an instruction code "wait(Prg\_d)" is written prior to an instruction code "cont(Prg\_d)" in the user program Prg\_a, it is possible to prevent the instruction code "cont(Prg\_d)" from being executed prior to execution of the instruction code "sleep" of the user program Prg\_d. As a result, the waiting state of the processor element 61<sub>4</sub> due to the instruction code "sleep" is reliably released by the execution of the instruction code "cont(Prg\_d)" in the processor element 61<sub>1</sub>.

Also, according to the multiprocessor 51, when an instruction code "sleep" is executed, by inputting to the arbiter 66 a notice indicating that the instruction code "sleep" was executed, it becomes possible for the arbiter 66 to know that the execution of a user program including the instruction code "sleep" is to be resumed in the future. Therefore, the arbiter 66 can prevent the user program from being switched with another user program and the number of operations to read the user program is reduced, so the processing time can be made shorter.

Specifically, in the example shown in Fig. 2, it is sufficient to read the user program Prg\_d only once from the common memory 65 to the local memory 32 of the processor element 61<sub>4</sub>, so the waiting time for the processor element 61<sub>4</sub> to read the user program Prg\_d can be made shorter. Also, when the execution of the user program Prg\_d is to be resumed, the processor element 61<sub>4</sub> is notified of this immediately by the execution of the instruction code "cont(Prg\_d)" in the processor element 61<sub>1</sub>. This is especially effective when executing a user program requiring real-time characteristics in which high-speed response is required.

Namely, it is possible to prevent needless operation of the processor element 11<sub>1</sub> reading a user program Prg\_D to the local memory 32 when executing the instruction code "gen(Prg\_D)" for the second time as explained using Fig. 8.

The effect of the prevention of needless reading of the user program from the common memory 65 to the local memory 32 is especially remarkable when the memory capacity of the local memory 32 and the size of the user program to be read are of the same order.
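The saving in read operations can be illustrated by counting how often a program is transferred from the common memory. This is a minimal sketch that assumes each "gen" causes exactly one read and that "cont" causes none; the two instruction sequences are simplified stand-ins for the related-art case and the first embodiment, and the function name is hypothetical.

```python
def reads_from_common_memory(instructions):
    """Count program reads implied by a sequence of instruction codes."""
    return sum(1 for code in instructions if code.startswith("gen("))


related_art = ["gen(Prg_D)", "gen(Prg_D)"]        # restarted by a second gen
first_embodiment = ["gen(Prg_d)", "cont(Prg_d)"]  # resumed by cont

print(reads_from_common_memory(related_art))       # 2
print(reads_from_common_memory(first_embodiment))  # 1
```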

#### Second Embodiment

The multiprocessor of the second embodiment is basically the same as the multiprocessor 51 of the first embodiment. The point of difference from the multiprocessor 51, however, is that it uses an instruction code "cont\_a" as a wait release instruction, which will be explained below, instead of using the instruction code "cont" of the first embodiment.

Namely, when execution of an instruction code "cont" in one processor element comes earlier than execution of the instruction code "sleep" corresponding to the instruction code "cont" in another processor element, the instruction code "cont" of the above first embodiment cannot release the waiting state of the other processor element due to the instruction code "sleep". In order to prevent such a situation, in the first embodiment, for example as shown in Fig. 2, it was necessary to write in the user program Prg\_a, prior to the instruction code "cont(Prg\_d)", an instruction code "wait(Prg\_d)" whose synchronization waiting state is released by execution of the instruction code "sleep" of the user program Prg\_d.

In the present embodiment, by using a new instruction code "cont\_a" instead of the instruction code "cont" of the first embodiment, the writing of an instruction code "wait" prior to that is made unnecessary.

The instruction code "cont\_a" designates a user program name as an argument in the same way as the instruction code "cont". Namely, the instruction format becomes "cont\_a(user program name)".

Also, when the instruction code "cont\_a(user program name)" is executed, the processor cores 31 of the processor elements 61<sub>1</sub> to 61<sub>4</sub> output instructions to release the waiting state for executing the user program specified by the argument (user program name) to the arbiter 66 via the common bus 17 shown in Fig. 1.

When the instruction code "cont\_a(user program name)" is executed, the processor cores 31 execute the next instruction codes without suspending the processing.

Also, the processor elements 61<sub>1</sub> to 61<sub>4</sub>, after receiving as input an inverse synchronization waiting instruction from the arbiter 66, enter an inverse synchronization waiting state until an instruction to release the state is input from the arbiter 66.

When the instruction to release a waiting state for execution of a user program is input, the arbiter 66 judges whether the user program is in the waiting state. When it is judged to be in the waiting state, the arbiter 66 outputs to the one of the processor elements 61<sub>1</sub> to 61<sub>4</sub> assigned the user program in the waiting state an instruction to release the waiting state via the common bus 17.

On the other hand, when it is judged in the above judgement that the user program is not in the waiting state, the arbiter 66 outputs an inverse synchronization waiting instruction to the one of the processor elements 61<sub>1</sub> to 61<sub>4</sub> which output the instruction to release the waiting state for the execution of the user program. Then, when the notice that an instruction code "sleep" was executed is received as an input, the arbiter 66 outputs to the one of the processor elements 61<sub>1</sub> to 61<sub>4</sub> in the above inverse synchronization waiting state an instruction to release the state and, at the same time, outputs via the common bus 17 an instruction to release the waiting state to the one of the processor elements 61<sub>1</sub> to 61<sub>4</sub> which executed the instruction code "sleep".
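The arbiter's handling of "cont\_a" described above amounts to a small state machine that works regardless of whether "sleep" or "cont\_a" arrives first. The following is a minimal sketch; the class `ContAArbiter`, its method names, and the returned signal tuples are hypothetical and only model the two judgement branches in the text.

```python
class ContAArbiter:
    """Models the arbiter's response to cont_a requests and sleep notices."""

    def __init__(self):
        self.sleeping = {}         # program -> element in the waiting state
        self.inverse_waiters = {}  # program -> element in inverse sync wait

    def on_cont_a(self, sender, program):
        """Handle an instruction to release the waiting state of `program`."""
        if program in self.sleeping:
            # The program is already in the waiting state: release it at once.
            target = self.sleeping.pop(program)
            return [("release_wait", target)]
        # Not yet sleeping: hold the sender in inverse synchronization wait.
        self.inverse_waiters[program] = sender
        return [("inverse_sync_wait", sender)]

    def on_sleep(self, element, program):
        """Handle a notice that `element` executed sleep in `program`."""
        if program in self.inverse_waiters:
            waiter = self.inverse_waiters.pop(program)
            # Release both sides at the same time.
            return [("release_inverse_sync_wait", waiter),
                    ("release_wait", element)]
        self.sleeping[program] = element
        return []


# cont_a arrives before sleep: the sender is held, then both are released.
arb = ContAArbiter()
print(arb.on_cont_a("61_1", "Prg_d"))
print(arb.on_sleep("61_4", "Prg_d"))

# sleep arrives before cont_a: the sleeper is released directly.
arb = ContAArbiter()
print(arb.on_sleep("61_4", "Prg_d"))
print(arb.on_cont_a("61_1", "Prg_d"))
```

In both orderings the sleeping program ends up released exactly once, which is why no preceding "wait" is needed in the requesting program.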

Below, the operation of the multiprocessor of the present embodiment will be explained by tracing the process of execution of user programs in the processor elements 61<sub>1</sub> to 61<sub>4</sub> shown in Fig. 1.

Here, as shown in Fig. 3, a case will be explained where user programs Prg\_aa, Prg\_b, Prg\_c, Prg\_d, and Prg\_e are executed in the processor elements 61<sub>1</sub> to 61<sub>4</sub> shown in Fig. 1.

The user program Prg\_aa shown in Fig. 3 differs from the above user program Prg\_a in that an instruction code "cont\_a(Prg\_d)" is written instead of the instruction code "cont(Prg\_d)" and the instruction code "wait(Prg\_d)" preceding it shown in Fig. 2. The user programs Prg\_b, Prg\_c, Prg\_d, and Prg\_e are respectively the same as the user programs Prg\_b, Prg\_c, Prg\_d, and Prg\_e shown in Fig. 2.

First, as shown in Fig. 3, the processor element 61<sub>1</sub> successively executes instruction codes "gen(Prg\_b)", "gen(Prg\_c)", "gen(Prg\_d)", and "wait(Prg\_c)" written in the user program Prg\_aa.

The processing is the same as that of the above instruction codes having identical names explained by using Fig. 2.

When a synchronization waiting state due to the instruction code "wait(Prg\_c)" is released, the instruction code "cont\_a(Prg\_d)" is executed in the processor core 31 of the processor element 61<sub>1</sub>.

As a result, an instruction to release the waiting state for execution of the user program Prg\_d is output from the processor element 61<sub>1</sub> to the arbiter 66 via the common bus 17 shown in Fig. 1.

Then, it is judged in the arbiter 66 whether the user program Prg\_d being executed in the processor element 61<sub>4</sub> is in the waiting state or not. Since it is not in the waiting state in this case, an inverse synchronization waiting instruction is output to the processor element 61<sub>1</sub>.

Then the processor element 61<sub>1</sub> which received as input the inverse synchronization waiting instruction enters an inverse synchronization waiting state.

Then the instruction code "sleep" of the user program Prg\_d is executed in the processor element 61<sub>4</sub>, and a notice indicating that the instruction code "sleep" was executed is output from the processor element 61<sub>4</sub> to the arbiter 66. Then, based on the notice, an instruction to release the inverse synchronization waiting state is output to the processor element 61<sub>1</sub> from the arbiter 66 via the common bus 17 and the state is released in the processor element 61<sub>1</sub>.

Next, the processor element 61<sub>1</sub> executes instruction codes "gen(Prg\_e)", "wait(Prg\_d)", and "wait(Prg\_e)" of the user program Prg\_aa. The processing of these executions is the same as in the case explained above by using Fig. 2.

Note, for example as shown in Fig. 4, when the instruction code "sleep" in the processor element 61<sub>4</sub> is executed at a timing earlier than execution of the instruction code "cont\_a(Prg\_d)" in the processor element 61<sub>1</sub>, the processor element 61<sub>4</sub> enters a waiting state after executing the instruction code "sleep". When an instruction to release the waiting state based on the execution of the instruction code "cont\_a(Prg\_d)" in the processor element 61<sub>1</sub> is input from the arbiter 66, the waiting state is released in the processor element 61<sub>4</sub>.

As explained above, according to the multiprocessor of the present embodiment, by using the above instruction code "cont\_a", it becomes unnecessary to write, prior to the wait release instruction, the instruction code "wait" whose synchronization waiting state is released by execution of the instruction code "sleep", as was required before the instruction code "cont" in the first embodiment shown in Fig. 2. Namely, in the case shown in Fig. 3, even when the execution of the instruction code "sleep" of the user program Prg\_d is carried out at a timing after that of the instruction code "cont\_a(Prg\_d)", the processor element 61<sub>1</sub> enters an inverse synchronization waiting state and synchronization between the user programs Prg\_aa and Prg\_d is guaranteed.

As a result, a programmer can write the user program Prg\_aa without considering the execution timing of the instruction code "sleep" of the user program Prg\_d, so the work load can be reduced.

The present invention is not limited to the above embodiments.

For example, in the above embodiments, a case was explained where instructions to release the waiting state, output by the processor elements 61<sub>1</sub> to 61<sub>4</sub> in accordance with the execution of the instruction codes "cont" and "cont\_a", are output via the arbiter 66 to the processor elements 61<sub>1</sub> to 61<sub>4</sub> executing user programs in the waiting state. However, the processor elements 61<sub>1</sub> to 61<sub>4</sub> executing user programs in the waiting state may monitor the common bus 17, so that instructions to release the waiting state need not be input via the arbiter 66.

Also, similarly, a notice indicating that the instruction code "sleep" or "end" was executed, output from the processor elements 61<sub>1</sub> to 61<sub>4</sub>, may be directly input to the corresponding processor elements 61<sub>1</sub> to 61<sub>4</sub> monitoring the common bus 17, that is, not via the arbiter 66.

Also, in the above embodiments, a case was explained where the instruction codes "gen", "cont", and "cont\_a" are written only in the user programs Prg\_a and Prg\_aa. However, the instruction codes may be written in the user programs Prg\_b, Prg\_c, and Prg\_d as well.

Also, the instruction code "sleep" may be written in a plurality of user programs.

Further, in the above embodiments, an example of a multiprocessor having four processor elements of identical configurations was shown. However, any number of processor elements may be used so long as there are two or more, and the configurations of the plurality of processor elements may differ.

Also, in the above embodiments, a case was explained where the components shown in Fig. 1 were provided on the same semiconductor chip. The present invention, however, can also be applied to a distributed processing system wherein, for example, the processor elements 61<sub>1</sub> to 61<sub>4</sub> shown in Fig. 1 are provided in different computers connected by a network.

Summarizing the effects of the invention, as explained above, according to the parallel processor of the present invention, various forms of synchronization can be established among programs executed in parallel in a plurality of processing means.

Also, according to the parallel processing method and storage medium of the present invention, various forms of synchronization can be established among a plurality of processings carried out in parallel.

Also, according to the parallel processor of the present invention, by executing a waiting instruction by a processing means, it is possible to control the reading of a program from a first storage means to a second storage means corresponding to the processing means and to reduce the number of read operations of the program from the first storage means to the second storage means and thereby shorten the waiting time of the processing means.

While the invention has been described with reference to specific embodiments chosen for purposes of illustration, it should be apparent that numerous modifications could be made thereto by those skilled in the art without departing from the basic concept and scope of the invention.

What is claimed is:

1. A parallel processor comprising a plurality of processing means which perform mutually parallel processing on the basis of instructions written in programs and are capable of communicating with each other via a common bus, wherein
  - one of said processing means suspends processing based on a said program and enters a waiting state when executing a wait instruction ("sleep") and releases said waiting state and restarts the processing based on said program based on execution of a wait release instruction ("cont") by another processing means and
    - the other processing means executes a next instruction without suspending processing after it executes said wait release instruction.
2. A parallel processor as set forth in claim 1, wherein said other processing means executes a synchronization wait instruction ("wait") to enter a synchronization waiting state and releases said synchronization waiting state based on execution of said wait instruction corresponding to said synchronization wait instruction or execution of a program end instruction ("end") indicating an end of a program by said one processing means.

3. A parallel processor as set forth in claim 1, further comprising:

a first storage means connected to said common bus and storing said programs and

second storage means provided corresponding to said plurality of processing means, reading from said first storage means programs to be executed by corresponding processing means via said common bus, supplying said processing means with instructions written in the read programs, and having faster access speeds than said first storage means,

a said second storage means continuing to store a program supplied to its corresponding processing means before entering said waiting state when said processing means is in said waiting state.

4. A parallel processor as set forth in claim 3,  
wherein a said second storage means continues to store a  
program until its corresponding processing means executes  
a program end instruction indicating an end of a program.

5. A parallel processor as set forth in claim 3, wherein:

said other processing means executes a program execution instruction ("gen") to make said one processing means execute a program stored in said first storage means; and


a said second storage means corresponding to said one processing means reads and stores a program which is instructed by said program execution instruction to be executed from said first storage means.

6. A parallel processor as set forth in claim 5, further comprising a program execution assigning means for determining said one processing means to execute a program instructed to be executed by said program execution instruction and for reading the program instructed to be executed by said program execution instruction from said first storage means to said second storage means corresponding to said determined processing means.

7. A parallel processor as set forth in claim 5, wherein when said one processing means enters a waiting state based on said wait instruction, said other processing means which executed said program execution instruction executes said wait release instruction.

8. A parallel processor as set forth in claim 5, wherein when said one processing means enters a waiting state based on said wait instruction, a processing means other than said other processing means which executed said program execution instruction executes said wait release instruction.

9. A parallel processor as set forth in claim 1, wherein said plurality of processing means and a common bus for connecting the plurality of processing means are installed in a single semiconductor chip.

10. A parallel processor comprising a plurality of processing means which perform mutually parallel processing on the basis of instructions written in programs and are capable of communicating with each other via a common bus, wherein

one of said processing means suspends processing based on a said program and enters a waiting state when executing a wait instruction ("sleep") and releases said waiting state and restarts the processing based on said program based on execution of a wait release instruction ("cont\_a") by another processing means and

said other processing means enters a synchronization waiting state when executing said wait release instruction until said one processing means enters said waiting state when said one processing means is not in said waiting state.

11. A parallel processor as set forth in claim 10, wherein said other processing means executes a next instruction without suspending its processing after executing said wait release instruction when said one processing means is in said waiting state when it executes said wait release instruction.

12. A parallel processor as set forth in claim 10, further comprising:

a first storage means connected to said common bus for storing said programs;

second storage means provided corresponding to said plurality of processing means, reading from said first storage means programs to be executed by corresponding processing means via said common bus, supplying said processing means with instructions written in the read programs, and having faster access speeds than said first storage means,

a said second storage means continuing to store a program supplied to its corresponding processing means before entering said waiting state when said processing means is in said waiting state.

13. A parallel processor as set forth in claim 12, wherein a said second storage means continues to store a program until its corresponding processing means executes a program end instruction ("end") indicating an end of a program.

14. A parallel processor comprising a plurality of processing means which perform mutually parallel processing on the basis of instructions written in programs and are capable of communicating with each other via a common bus, comprising:

a first storage means connected to said common bus for storing said programs and

second storage means provided corresponding to said plurality of processing means, reading from said first storage means programs to be executed by corresponding processing means via said common bus, supplying said processing means with instructions written in the read programs, and having faster access speeds than said first storage means;

one of said processing means suspending processing based on a said program and entering a waiting state when executing a wait instruction ("sleep") and releasing said waiting state and restarting the processing based on said program based on execution of a wait release instruction ("cont") by another processing means;

a said second storage means continuing to store a program supplied to its corresponding processing means before entering said waiting state when said one processing means is in said waiting state.

15. A parallel processor as set forth in claim 14, wherein a said second storage means continues to store a program until its corresponding processing means executes a program end instruction ("end") indicating an end of a program.

16. A parallel processor as set forth in claim 14, wherein:

said other processing means executes a program execution instruction ("gen") to make said one processing means execute a program stored in said first storage means; and

a said second storage means corresponding to said one processing means reads and stores the program instructed to be executed by said program execution instruction from said first storage means.

17. A parallel processor as set forth in claim 16, further comprising a program execution assigning means for determining said one processing means to execute a program instructed to be executed by said program execution instruction and for reading the program instructed to be executed by said program execution instruction from said first storage means to said second storage means corresponding to said determined processing means.

18. A parallel processing method for performing at least first processing and second processing in parallel based on instructions written in programs, wherein:

said first processing suspends processing based on a said program and enters a waiting state by executing a wait instruction ("sleep") and releases said waiting state and resumes processing based on said program based on execution of a wait release instruction ("cont") in said second processing and

said second processing executes a next instruction without suspending its processing after executing said wait release instruction.

19. A parallel processing method as set forth in claim 18, wherein said second processing enters a synchronization waiting state by execution of a synchronization wait instruction ("wait") and releases said synchronization waiting state based on execution of said wait instruction corresponding to said synchronization wait instruction in said first processing.
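As a non-limiting sketch, the interplay of "sleep", "cont" and "wait" in claims 18 and 19 can be imitated with ordinary threads. The class `ProgramSync`, the function `run_cont_demo`, and the log strings are hypothetical names introduced here. The sketch also assumes that a "cont" issued before the matching "sleep" is remembered, a case the claims do not address.

```python
import threading
from typing import List


class ProgramSync:
    """Hypothetical model of the waiting state of one program."""

    def __init__(self) -> None:
        self._cv = threading.Condition()
        self._sleeping = False   # True while the first processing is waiting
        self._released = False   # set by cont(), consumed by sleep()

    def sleep(self) -> None:
        # Wait instruction: suspend the caller until a release arrives.
        with self._cv:
            self._sleeping = True
            self._cv.notify_all()        # lets a pending wait() proceed
            while not self._released:
                self._cv.wait()
            self._released = False
            self._sleeping = False

    def cont(self) -> None:
        # Wait release instruction: record the release and return at once,
        # so the caller goes straight on to its next instruction.
        with self._cv:
            self._released = True
            self._cv.notify_all()

    def wait(self) -> None:
        # Synchronization wait instruction: block until the first
        # processing has executed its wait instruction.
        with self._cv:
            while not self._sleeping:
                self._cv.wait()


def run_cont_demo() -> List[str]:
    sync = ProgramSync()
    log: List[str] = []

    def first() -> None:
        log.append("first: sleep")
        sync.sleep()
        log.append("first: resumed")

    t = threading.Thread(target=first)
    t.start()
    sync.wait()                          # second processing: "wait"
    log.append("second: wait released")
    sync.cont()                          # second processing: "cont"
    log.append("second: next instruction")
    t.join()
    return log


if __name__ == "__main__":
    print(run_cont_demo())
```

The first two log entries always appear in the order shown in the code. The last two may appear in either order, because the second processing does not suspend after `cont`.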

20. A parallel processing method for performing at least first processing and second processing in parallel based on instructions written in programs, wherein:

said first processing suspends processing based on a said program and enters a waiting state by executing a wait instruction ("sleep") and releases said waiting state and resumes processing based on said program based on execution of a wait release instruction ("cont\_a") in the second processing and

said second processing enters a synchronization waiting state by executing said wait release instruction until said first processing enters said waiting state when said first processing is not in said waiting state.
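The blocking variant of claim 20 can be sketched in the same illustrative manner. Here `cont_a` does not return until the first processing has actually entered its waiting state. The names `SleepGate` and `run_cont_a_demo` and the log strings are assumptions for this example only.

```python
import threading
from typing import List


class SleepGate:
    """Hypothetical model in which the release waits for the sleeper."""

    def __init__(self) -> None:
        self._cv = threading.Condition()
        self._sleeping = False   # True while the first processing is waiting
        self._released = False   # set by cont_a(), consumed by sleep()

    def sleep(self) -> None:
        # Wait instruction: announce the waiting state, then suspend.
        with self._cv:
            self._sleeping = True
            self._cv.notify_all()        # wakes a cont_a() that arrived early
            while not self._released:
                self._cv.wait()
            self._released = False

    def cont_a(self) -> None:
        # Wait release instruction with synchronization: stay blocked
        # until the first processing is in its waiting state.
        with self._cv:
            while not self._sleeping:
                self._cv.wait()
            self._sleeping = False
            self._released = True
            self._cv.notify_all()


def run_cont_a_demo() -> List[str]:
    gate = SleepGate()
    log: List[str] = []

    def second() -> None:
        gate.cont_a()                    # issued before the first sleeps
        log.append("second: cont_a done")

    t = threading.Thread(target=second)
    t.start()
    log.append("first: sleep")
    gate.sleep()
    log.append("first: resumed")
    t.join()
    return log


if __name__ == "__main__":
    print(run_cont_a_demo())
```

Because `cont_a` cannot complete before `sleep` has been entered, "first: sleep" is always the first log entry, whichever thread starts running first.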

21. A storage medium for storing in a computer-readable format routines of first processing and second processing to be performed in parallel based on instructions written in programs, wherein:

said first processing is processing which suspends processing based on a said program and enters a waiting state by executing a wait instruction ("wait") and releases said waiting state and resumes processing based on said program based on execution of a wait release instruction ("cont") in said second processing and

said second processing is processing which executes a next instruction without suspending its processing after executing said wait release instruction.

22. A storage medium for storing in a computer-readable format routines of first processing and second processing to be performed in parallel based on instructions written in programs, wherein:

said first processing is processing which suspends processing based on a said program and enters a waiting state by executing a wait instruction ("sleep") and releases said waiting state and resumes processing based on said program based on execution of a wait release instruction ("cont\_a") in the second processing and

said second processing is processing which enters a synchronization waiting state by executing said wait release instruction until said first processing enters said waiting state when said first processing is not in said waiting state.

PARALLEL PROCESSOR, PARALLEL PROCESSING METHOD AND  
STORAGE MEDIUM

ABSTRACT OF THE DISCLOSURE

A parallel processor capable of establishing synchronization among programs executed in parallel, wherein a processor element suspends its processing and enters a waiting state when a wait instruction "sleep" is executed in a user program Prg\_d and resumes the processing by releasing the above waiting state based on execution of a wait release instruction "cont(Prg\_d)" by another processor element and wherein the latter processor element executes a next instruction without suspending its processing after executing the wait release instruction "cont(Prg\_d)".

FIG.1

FIG.2

"sleep": WAIT INSTRUCTION  
"cont": WAIT RELEASE INSTRUCTION  
"wait": SYNCHRONIZATION WAIT INSTRUCTION  
"gen": PROGRAM EXECUTION INSTRUCTION  
"end": PROGRAM END INSTRUCTION

FIG.3

FIG.4

FIG.5

FIG.6

FIG.7

FIG.8

"end": PROGRAM END INSTRUCTION

**DECLARATION AND POWER OF ATTORNEY FOR PATENT APPLICATION**  
**English Language Declaration**

As below named inventors, we hereby declare that:

Our residence, post office address and citizenship are as stated below next to our names.

We believe we are the original, first and joint inventors of the subject matter which is claimed and for which a patent is sought on the invention entitled

PARALLEL PROCESSOR, PARALLEL PROCESSING METHOD, AND STORING MEDIUM

the specification of which

(check one)

is attached hereto.

was filed on \_\_\_\_\_ as

Application Serial No. \_\_\_\_\_

and was amended on \_\_\_\_\_

I hereby state that I have reviewed and understand the contents of the above identified specification, including the claims, as amended by any amendment referred to above.

I acknowledge the duty to disclose information which is material to patentability as defined in Title 37, Code of Federal Regulations, §1.56.

I hereby claim foreign priority benefits under Title 35, United States Code, §119 of any foreign application(s) for patent or inventor's certificate listed below and have also identified below any foreign application for patent or inventor's certificate having a filing date before that of the application on which priority is claimed:

Prior Foreign Application(s)

| (Number)   | (Country) | (Day/Month/Year Filed) | Priority Claimed (Yes / No) |
|------------|-----------|------------------------|-----------------------------|
| P10-302679 | JAPAN     | 23/10/1998             | X                           |
|            |           |                        |                             |
|            |           |                        |                             |

We hereby claim the benefit under Title 35, United States Code, §120 of any United States application(s) listed below and insofar as the subject matter of each of the claims of this application is not disclosed in the prior United States application in the manner provided by the first paragraph of Title 35, United States Code §112, I acknowledge the duty to disclose material to patentability as defined in Title 37, Code of Federal Regulations, §1.56 and 1.63(d) which became available between the filing date of the prior application and the national or PCT international filing date of this application:

| (Application Serial No.) | (Filing Date) | (Status: patented, pending, abandoned) |
|--------------------------|---------------|----------------------------------------|
|                          |               |                                        |

We hereby declare that all statements made herein of our own knowledge are true and that all statements made on information and belief are believed to be true, and further that these statements were made with the knowledge that willful false statements and the like so made are punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the United States Code and that such willful false statements may jeopardize the validity of the application or any patent issued thereon.

POWER OF ATTORNEY: As a named inventor, I hereby appoint the following attorney(s) and/or agent(s) to prosecute this application and transact all business in the Patent and Trademark Office connected therewith.

Ronald P. Kananen, Reg. No. 24,104; Ralph T. Rader, Reg. No. 28,772; Michael D. Fishman, Reg. No. 31,951; Richard D. Grauer, Reg. No. 22,388; Joseph V. Coppola, Sr., Reg. No. 33,373; Michael B. Stewart, Reg. No. 36,018; Steven L. Nichols, Reg. No. 40,326; Christopher M. Tanner, Reg. No. 41,518

Send Correspondence to:

Ronald P. Kananen, Esq.  
RADER, FISHMAN & GRAUER  
The Lion Building  
1233 20<sup>th</sup> Street, N.W., Suite 501  
Washington, D.C. 20036

Direct telephone calls to:

Ronald P. Kananen, Esq.  
(202) 955-3750

Full name of first joint inventor YOSHIHIKO IMAMURA

Inventor's signature

Date

Residence Ibaraki, JAPAN

Citizenship JAPAN

Post Office Address c/o Sony Corporation 7-35, Kitashinagawa, 6-Chome Shinagawa-Ku, Tokyo, JAPAN

Full name of second joint inventor

Second Inventor's signature

Date

Residence

Citizenship

Post Office Address

Full name of third joint inventor

Third Inventor's signature

Date

Residence

Citizenship

Post Office Address

(Supply similar information and signature for subsequent joint inventors.)